Mobile document capture assist for optimized text recognition

ABSTRACT

A device and method for providing a visual cue for improved text imaging on a mobile device. The method includes determining a minimum text size for accurate optical character recognition (OCR) of an image captured by the mobile device, receiving an image stream of a printed substrate, and displaying the image stream and a visual cue superimposed onto the image stream, wherein the visual cue is indicative of the minimum text size. The method further includes capturing a digital image of the image stream, wherein the digital image does not include the visual cue. Additionally, the method further includes notifying a user of the mobile device when text displayed within the image stream is at least as large as the minimum text size.

BACKGROUND

Mobile devices, such as smartphones, tablet computers, and other similar computing devices, are increasingly being used for capturing and processing data such as images and text. Typically, a mobile device includes a high quality camera that can be used to capture images of printed documents. For example, a customer may be asked to print and fill out a form, and send a digital copy of the completed form to a specific vendor for further processing. The user may opt to capture an image of the form with their mobile device. Software installed on the device may then further process the captured image. For example, the software may be configured to enhance, recognize, store and share the images of printed documents. Continuing the above example, the user may store the captured image of the form and transmit the image to the vendor.

The mobile device used to capture the image of the printed document may be limited by software and hardware components within the device. For example, the mobile device's camera may contain a camera lens that has a focal length that prevents the lens from accurately focusing on an object a short distance away. When such a limitation exists, a user holding the device too closely to the printed document may not be able to properly focus upon the printed document.

Conversely, for the purposes of text quality, if a user is too far from the printed document, the text may be distorted, of insufficient resolution, or otherwise illegible, thereby reducing the overall quality of the captured text. This reduced quality can adversely affect optical character recognition (OCR) and other recognition algorithms. Quality correction can be applied after capture; however, the resolution that is inherently lost as a result of the poor image quality can never be recovered perfectly via digital processing after the image is captured.

SUMMARY

In one general respect, the embodiments disclose a method of providing a visual cue for improved text imaging on a mobile device. The method includes determining a minimum text size for accurate optical character recognition (OCR) of an image captured by the mobile device, receiving an image stream of a printed substrate, and displaying the image stream and a visual cue superimposed onto the image stream, wherein the visual cue is indicative of the minimum text size.

In another general respect, the embodiments disclose a mobile device. The mobile device includes a processing device, a display operably connected to the processing device and having an associated display resolution, an image capture device operably connected to the processing device and having an associated image capture resolution, and a computer readable medium in communication with the processing device. The computer readable medium comprises one or more programming instructions for causing the processing device to determine a minimum text size for accurate optical character recognition (OCR) of an image captured by the mobile device, receive an image stream of a printed substrate as captured by the image capture device, and display the image stream and a visual cue superimposed onto the image stream on the display, wherein the visual cue is indicative of the minimum text size.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts a first or front face of a mobile device that includes a camera configured to capture an image according to an embodiment.

FIG. 2 depicts a second or rear face of a mobile device that includes a camera configured to capture an image according to an embodiment.

FIG. 3 depicts an example of a printed document to be captured according to an embodiment.

FIG. 4A depicts an example of a screenshot of a displayed document including an example of a visual marker according to an embodiment.

FIG. 4B depicts a second example of a screenshot of a displayed document including the visual marker according to an embodiment.

FIG. 5 depicts a flow chart example of a process for viewing and capturing an image of a printed document using visual markers to optimize text recognition according to an embodiment.

FIG. 6 depicts various embodiments of a computing device for implementing the various methods and processes described herein.

DETAILED DESCRIPTION

This disclosure is not limited to the particular systems, devices and methods described, as these may vary. The terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope.

As used in this document, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. As used in this document, the term “comprising” means “including, but not limited to.”

For the purposes of this document, a “printed document” or “document” refers to a hardcopy of one or more pieces of printed substrates containing a combination of text and/or images. For example, the document may be a form, a page from a book or other publication, a poster, a billboard or another similar form of advertising, and any other printed surface.

A “mobile device” refers to a portable computing device that includes an image capturing device, a processor and tangible, computer-readable memory. The memory may contain programming instructions in the form of a software application that, when executed by the processor, causes the device to perform one or more image acquisition and processing operations according to the programming instructions. Examples of suitable devices include portable electronic devices such as smartphones, personal digital assistants, cameras, tablet devices, electronic readers, personal computers, media players, satellite navigation devices and the like.

An “imaging device” refers to any device capable of optically viewing an object and converting an interpretation of that object into electronic signals. One such example of an imaging device is a camera. An “image capture module” refers to the software application and/or the image sensing hardware of an electronic device that is used to capture images.

An “optical character recognition (OCR) engine” refers to one or more software applications configured to digitally convert images of captured text into machine-encoded text.

“Augmented reality” (AR) refers to the concept of combining a real scene viewed by a user with a virtual scene overlay generated by a computing device that augments the real scene with additional useful information.

The use of mobile devices for scanning printed documents comes with unique opportunities as well as unique challenges. Advantages include low-cost, portability, ubiquity, increasing computational power, and the integration of multiple imaging devices and image capture modules into a single mobile device. Challenges include the variety of capture conditions, including lighting variation, geometry and spacing of the object being captured, motion caused blur, and other factors that can affect image quality. As described herein, various concepts related to AR may be incorporated into an image capturing process as used by a mobile device to improve the quality of printed document images captured by the mobile device so as to provide improved text recognition from the captured image.

Various aspects of AR may be integrated into the image capturing process to reduce problems associated with low quality text capture, which can greatly affect the quality of an image of a printed document for use in OCR and other recognition applications. Using AR techniques, a textual or other similar visual cue may be displayed to a user of a mobile device, the visual cue providing a visual reference of a suitable text size for accurate OCR processing.

In image capture, for a given sensor/optics configuration integrated in an imaging device, there is a linear relationship between capture distance (or optical zoom) and image resolution (as measured, for example, by average text character width in pixels). Thus, when capturing an image of text, the smaller the text in the image, the lower the resolution of the text. In turn, there exists a proportional (though nonlinear) relationship between image resolution and OCR accuracy as obtained by a given OCR engine.

Typically, OCR performance and accuracy degrade rapidly when character resolution falls below a certain threshold. In experiments performed with state of the art OCR engines such as ABBYY or Nuance, this threshold resolution is approximately 16 pixels per character. For typical modern smartphones, this resolution is achieved when capturing printed text with a 12 point font size at a distance of approximately 12 inches. Thus, in order to capture sufficient character resolution for obtaining high accuracy OCR, a minimum text size of no fewer than 16 pixels per character is preferable.

It should be noted that 16 pixels per character is shown as an example only. The minimum sizing requirements may vary based upon the type of device being used to capture the text image, as well as the OCR engine being used and any operational parameters associated with the OCR engine. However, a visual cue may be determined based upon the minimum acceptable sizing requirements for recognizing text according to the techniques as described herein regardless of the capabilities of the image sensor and OCR engine.

FIG. 1 shows one example of a mobile device, generally designated 100. The mobile device 100 may include a front face 102 and a display 104. The display 104 may be any suitable component for displaying images, including, but not limited to, electroluminescent displays, electronic paper displays, vacuum fluorescent displays, light emitting diode (LED) displays, cathode ray tube (CRT) displays, liquid crystal (LCD) displays, plasma display panels, digital light processing (DLP) displays, and organic light-emitting diode (OLED) displays. The display 104 may further include a touch sensitive screen, including, but not limited to, resistive touchscreens, capacitive touchscreens, and infrared touchscreens.

FIG. 2 shows a rear face 202 of the mobile device 100. The rear face 202 may include an imaging device 204. The imaging device 204 may be any suitable component capable of receiving an optical image and transmitting the information to other components for processing.

The imaging device may further have an ability to adjust its focal length and aperture in such a manner that would allow it to zoom and properly focus upon an intended object to be imaged. This adjustment may define an “optimal focal distance,” or a range of distances in which the mobile device 100 may be properly positioned from the intended object to be imaged to achieve a clear image.

While the imaging device 204 is depicted on the rear face of the present example, persons skilled in the art will appreciate that the imaging device 204 may be positioned at any location upon any face of the mobile device 100, or may even be external to the mobile device 100 and connected by any means of electronic communication, including, but not limited to, physical cable communication such as universal serial bus (USB), wireless radio communication, wireless light communication, or near field communication technology.

In some embodiments, the display 104 may be positioned within the mobile device 100, and may be configured in such a way so as to display the output of the imaging device 204 in real time so that the user may view the display 104 and see the output of the imaging device 204 on the display.

Accordingly, the configuration of the mobile device 100 as shown in FIGS. 1 and 2 is only an example, and persons skilled in the art will appreciate other configurations that are able to achieve a similar overall result.

The mobile device 100 may be used to capture an image of a printed document 300, as shown in FIG. 3. The mobile device 100 may be positioned so that the imaging device 204 is facing the printed document 300 desired to be imaged. Preferably, but not necessarily, the printed document 300 is placed on a flat (but not necessarily horizontal) surface as is shown in FIG. 3. The imaging device 204 may be activated to view the printed document 300, wherein the mobile device 100 may capture and render an image depicting the printed document 300 upon the display 104 by use of an image capture module.

In accordance with the present disclosure, a visual cue may be overlaid onto an image of a text document being captured for the purposes of ensuring high quality capture of text images. As shown in FIG. 4A, the image capture module or another similar software module may include a visual cue 402 superimposed onto a display 400 of text 404 being captured. It should be noted that, as used herein, the term “text” refers to any alphanumeric characters. To enable the visual cue 402, the user of mobile device 100 may launch or otherwise access an application specifically intended for capturing an image of text. The application may superimpose the visual cue 402 onto the display 400. Alternatively, an image capture application may detect that the image includes text, and superimpose the visual cue 402 automatically without additional user input.

It should be noted that, as shown in FIG. 4A (and FIG. 4B as described below), the visual cue 402 is a text string reading “TEXT SIZE GUIDE.” This is shown by way of example only, and additional visual cues may be used. For example, a visual cue may include a geometric shape such as a rectangular bounding box indicating minimum height or width for text being captured. Alternatively, the visual cue may be an interactive cue that changes based upon a user's actions. For example, the cue may be a colored circle in a corner of the display. The cue may remain red until the text is an acceptable size. Once the text is determined to be an acceptable size (i.e., at or above a minimum width or height in pixels for each character), the cue may change color, e.g., from red to green.

Referring again to FIG. 4A, as the user moves the device 100, or alters the zoom of the device, the visual cue 402 may provide a reference for acceptable text character size for ensuring that the quality of any captured text is suitable for OCR. As shown in FIG. 4B, as the user zooms in on the text 404, either by moving the device 100 closer to the document or using an optical zoom, the text 404 increases in size. However, the visual cue 402 remains at a constant size, thereby providing the user with a reference for properly sizing the text 404.

It should be noted that, as shown in FIGS. 4A and 4B, the visual cue 402 is positioned at the top center of the display. However, this is shown by way of example only. The position of the visual cue 402 may be varied based upon user preference or by the performance capabilities of the image capture application. For example, the image capture application may have the capability to detect white space in a document. In this example, the visual cue may be displayed within a white space of the document so as to not interfere with display of the text to be captured. If no white space is detected, the visual cue may be positioned at a default location such as is shown in FIGS. 4A and 4B.

FIG. 5 depicts a sample process flow for acquiring an image of text using the concepts and ideas as discussed above. A processing device, such as the mobile device 100, may obtain 500 image sensor resolution and OCR engine performance as a function of character resolution for that mobile device for use with an image capture application or a text-specific image-capture application as described above. Based upon these values, the mobile device may compute 502 the minimum text size for accurate OCR performance. For example, the computed 502 minimum text character size may be 16 pixels wide. However, to further ensure high quality, a larger minimum character width of, for example, 20 pixels may be used.
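By way of illustration only, the computation 502 of the minimum text size might be sketched as follows. The sketch assumes the OCR engine performance is available as a table of measured accuracy per character width; the function name, the table values, the target accuracy and the safety margin are all hypothetical and are not taken from any particular OCR engine.

```python
import math


def minimum_character_width(accuracy_by_width, target_accuracy, safety_margin=1.0):
    """Return a minimum character width in pixels for acceptable OCR accuracy.

    accuracy_by_width maps a character width in pixels to an OCR accuracy
    between 0.0 and 1.0 (hypothetical calibration data for one device and
    one OCR engine). The result is the smallest tested width such that it
    and every larger tested width reach target_accuracy, scaled by an
    optional safety margin and rounded up to a whole pixel.
    """
    threshold = None
    # Walk from the largest tested width downward and stop at the first
    # width whose accuracy falls below the target.
    for width in sorted(accuracy_by_width, reverse=True):
        if accuracy_by_width[width] < target_accuracy:
            break
        threshold = width
    if threshold is None:
        raise ValueError("no tested character width reaches the target accuracy")
    return math.ceil(threshold * safety_margin)


# Hypothetical accuracy curve that degrades rapidly below 16 pixels.
curve = {8: 0.60, 12: 0.85, 16: 0.98, 20: 0.99, 24: 0.99}
print(minimum_character_width(curve, 0.98))                      # 16
print(minimum_character_width(curve, 0.98, safety_margin=1.25))  # 20
```

With a safety margin of 1.25, the 16 pixel threshold in this hypothetical table becomes the larger 20 pixel minimum mentioned above.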

The mobile device may also determine 504 the visual cue, including what size to make the visual cue such that, as the visual cue is displayed on the screen of the mobile device, it is accurately sized to represent the minimum text size for high OCR performance and accuracy. A specific set of equations may be used to determine 504 the size of the visual cue.

Based upon the resolution of the image sensor, the size of an image being captured may be represented as M1×N1, and the display screen resolution for the mobile device may be represented as M2×N2. Typically, M2<<M1 and N2<<N1. Next, a ratio r=max(M2/M1, N2/N1) of the display resolution to the captured image resolution is computed. The minimum acceptable character size (e.g., 20 pixels as discussed above) may be represented by L1. Then, the character width as shown on the mobile device screen may be defined as L2=r*L1. Based upon the result of this equation, the mobile device may accurately determine 504 the size of the visual cue to display on the mobile device screen.
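As a minimal sketch, the sizing computation described above may be expressed in a few lines. The function name and the example resolutions are hypothetical; the only assumption beyond the text is that both resolutions list their dimensions in the same order.

```python
def cue_width_on_display(sensor_size, display_size, min_char_width_px):
    """Scale the minimum character width L1 to its on-screen width L2.

    sensor_size is (M1, N1), the captured image resolution.
    display_size is (M2, N2), the display resolution, in the same order.
    """
    m1, n1 = sensor_size
    m2, n2 = display_size
    # r = max(M2/M1, N2/N1), the ratio of display to captured resolution.
    r = max(m2 / m1, n2 / n1)
    # L2 = r * L1
    return r * min_char_width_px


# Hypothetical 4000x3000 sensor shown on a 1000x750 preview: r = 0.25,
# so a 20 pixel minimum character is drawn 5 display pixels wide.
print(cue_width_on_display((4000, 3000), (1000, 750), 20))  # 5.0
```

Because the cue is drawn at L2 display pixels, text that appears at least as large as the cue in the preview is at least L1 pixels wide in the captured image.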

A mobile device may receive 506 an image stream or video capture of a printed document, and display the image stream on the display of the mobile device. The image stream may be a live view of the printed document as it is currently aligned and oriented in relation to the mobile device. At or about the same time, the mobile device may superimpose 508 the visual cue on the image stream. As described above, the user may access a specific text capturing application that superimposes the visual cue onto the image stream, or the standard image capturing software installed on the mobile device may be configured to identify that the user is taking an image of text, and automatically superimpose 508 the visual cue onto the image stream. Alternatively, the user may choose an option to display the text size guide during capture.

It should be noted that, as described above, the visual cue may be a representation of the minimum sized text the user should capture to ensure that OCR accuracy remains acceptable. The visual cue may be a text string including appropriately sized characters (as shown in FIGS. 4A and 4B), a geometric shape representing minimum character size, or a dynamic cue that provides an indication (e.g., a visual change, sound cue or haptic notification) that the text being displayed in the image stream is an acceptable size.

Optionally, the mobile device may determine 510 a suitable position within the image stream for placing the visual cue in such a manner that the cue attracts the attention of the user and does not interfere with the text content being captured. For example, the mobile device may determine 510 that there is a portion of white space within the image stream and superimpose 508 the visual cue at that position so as to not interfere with the user's view of the text being captured. Additionally, the mobile device may determine 510 an orientation of the visual cue based upon the position of the mobile device and the document being captured. For example, as shown in FIGS. 4A and 4B, if the mobile device is placed in a landscape mode, the visual cue is rotated such that it is legible to the user. Alternatively, the visual cue may be positioned such that the visual cue is oriented in a similar manner to the text being captured, regardless of the position of the mobile device.
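One possible way to determine 510 a white space position is sketched below. The disclosure does not prescribe a detection technique; this sketch assumes a preview frame has already been binarized into a hypothetical ink mask (1 for a text or image pixel, 0 for background) and uses a summed-area table to find the first cue-sized window containing no ink, falling back to a default location otherwise.

```python
def find_cue_position(ink_mask, cue_height, cue_width, default=(0, 0)):
    """Return (row, column) of the top-left corner for the visual cue.

    ink_mask is a list of equal-length rows of 0/1 values (an assumed
    binarized preview frame). The first window of cue_height x cue_width
    pixels that contains no ink is chosen; if none exists, the default
    location is returned.
    """
    rows = len(ink_mask)
    cols = len(ink_mask[0]) if rows else 0
    # Summed-area table with an extra zero row and column, so that the ink
    # count of any rectangle can be read with four lookups.
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            sat[y + 1][x + 1] = (ink_mask[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    for top in range(rows - cue_height + 1):
        for left in range(cols - cue_width + 1):
            bottom, right = top + cue_height, left + cue_width
            ink = (sat[bottom][right] - sat[top][right]
                   - sat[bottom][left] + sat[top][left])
            if ink == 0:
                return (top, left)
    return default


# Tiny hypothetical frame with a 2x3 blank region in the middle.
frame = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(find_cue_position(frame, 2, 3))  # (1, 2)
```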

The mobile device may continually update the image stream as the user moves the mobile device, and the user may compare 512 the size of the visual cue and the text being captured. Alternatively, if the mobile device is using a dynamic cue, the mobile device may compare 512 the visual cue and the text automatically by comparing the number of pixels in the image stream text against the minimum number of pixels for producing a high quality image for OCR accuracy.

If the mobile device is automatically comparing 512 the visual cue and the document text, the mobile device may determine 514 if the mobile device is at an acceptable distance or zoom setting for capturing the text.

If the processor does determine 514 that the mobile device is at an acceptable distance and zoom for capturing the text, the processor may notify 516 the user by altering the visual cue (e.g., changing the color of the visual cue), outputting a sound, causing a haptic feedback such as vibration, or otherwise altering the output of the screen.
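The automatic comparison 512 and determination 514 for a dynamic cue might look like the following sketch. It assumes a hypothetical upstream text detector that reports the widths, in captured-image pixels, of characters found in the current frame; summarizing those widths by their median is an illustrative choice and is not specified in this disclosure.

```python
from statistics import median


def cue_state(char_widths_px, min_char_width_px):
    """Return "green" when detected text is large enough, otherwise "red".

    char_widths_px holds character widths reported by an assumed text
    detector for the current frame. An empty list (no text found) keeps
    the cue red.
    """
    if not char_widths_px:
        return "red"
    # Median is used here so a few unusually small or large characters do
    # not flip the cue; this is an assumption of the sketch.
    if median(char_widths_px) >= min_char_width_px:
        return "green"
    return "red"


print(cue_state([18, 21, 22, 25], 20))  # green
print(cue_state([10, 12, 14], 20))      # red
```

The returned state could drive any of the notifications described above, such as changing the cue color, playing a sound or triggering vibration.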

The user may then opt to capture 518 an image of the printed document. For example, the user may use an input device such as a button or an active portion of the display to capture 518 the image. Alternatively, the mobile device may be configured to automatically capture 518 an image of the text. For example, once the mobile device determines the text size in the image stream matches or exceeds the minimum size indicated by the visual cue, the mobile device may automatically capture 518 the image. Similarly, if the user holds the mobile device still for a period of time (e.g., 1 second) after the text size in the image stream matches or exceeds the minimum size indicated by the visual cue, the mobile device may automatically capture 518 the image. Based upon the available features of the mobile device and the image capture application, the user may have the option to select various image capture features such as automatic image capture, and select which of the automatic features are enabled prior to launching the application, or during normal operation of the application.
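The hold-still variant of automatic capture 518 can be sketched as a small state machine that is updated once per preview frame. The class name, the per-frame inputs and the re-arming behavior are assumptions made for illustration; how stillness is sensed (e.g., from motion sensors or frame differences) is left outside the sketch.

```python
class AutoCaptureTrigger:
    """Signal a capture after text stays large enough while the device is still."""

    def __init__(self, hold_seconds=1.0):
        self.hold_seconds = hold_seconds
        self._ready_since = None  # time at which both conditions first held

    def update(self, timestamp, text_large_enough, device_still):
        """Process one frame; return True when a capture should be taken."""
        if not (text_large_enough and device_still):
            # Either condition failing restarts the hold period.
            self._ready_since = None
            return False
        if self._ready_since is None:
            self._ready_since = timestamp
        if timestamp - self._ready_since >= self.hold_seconds:
            self._ready_since = None  # re-arm so only one capture is signaled
            return True
        return False


trigger = AutoCaptureTrigger(hold_seconds=1.0)
print(trigger.update(0.0, True, True))  # False
print(trigger.update(0.5, True, True))  # False
print(trigger.update(1.0, True, True))  # True
```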

Once an image is captured 518, post-processing 520 may then be performed on the captured image. The post-processing 520 may include performing an OCR or other similar recognition algorithm, updating meta-data associated with the captured image, enhancing the quality of the captured image, and other similar post-processing techniques.

As discussed above, the visual cue display and text capture method and process as described above may be performed and implemented by an operator of a mobile device. FIG. 6 depicts an example of internal hardware that may be used to contain or implement the various computer processes and systems as discussed above. For example, mobile device 100 as discussed above may include a similar internal hardware architecture to that as illustrated in FIG. 6. An electrical bus 600 serves as the main information highway interconnecting the other illustrated components of the hardware. CPU 605 is the central processing unit of the system, performing calculations and logic operations required to execute a program. CPU 605, alone or in conjunction with one or more of the other elements disclosed in FIG. 6, is a processing device, computing device or processor as such terms are used within this disclosure. Read only memory (ROM) 610 and random access memory (RAM) 615 constitute examples of memory devices.

A controller 620 interfaces one or more optional memory devices 625 with the system bus 600. These memory devices 625 may include, for example, an external DVD drive or CD ROM drive, a hard drive, flash memory, a USB drive or the like. As indicated previously, these various drives and controllers are optional devices. Additionally, the memory devices 625 may be configured to include individual files for storing any software modules or instructions, auxiliary data, incident data, common files for storing groups of contingency tables and/or regression models, or one or more databases for storing the information as discussed above.

Program instructions, software or interactive modules for performing any of the functional steps associated with the processes as described above may be stored in the ROM 610 and/or the RAM 615. Optionally, the program instructions may be stored on a tangible computer readable medium such as a compact disk, a digital disk, flash memory, a memory card, a USB drive, an optical disc storage medium, such as a Blu-ray™ disc, and/or other recording medium.

A display interface 630 may permit information from the bus 600 to be displayed on the display 635 in audio, visual, graphic or alphanumeric format. Communication with external devices may occur using various communication ports 640. A communication port 640 may be attached to a communications network, such as the Internet, a local area network or a cellular telephone data network.

The hardware may also include an interface 645 which allows for receipt of data from input devices such as a keyboard 650 or other input device 655 such as a remote control, a pointing device, a video input device and/or an audio input device.

The above-disclosed features and functions, as well as alternatives, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements may be made by those skilled in the art, each of which is also intended to be encompassed by the disclosed embodiments.

The invention claimed is:
 1. A method of providing a visual cue for improved text imaging on a mobile device, the method comprising: determining, by a processing device operably connected to a mobile device, a minimum text size for optical character recognition (OCR) of an image captured by the mobile device; receiving, by an image capturing device operably connected to the mobile device, an image stream of a printed substrate of a document; comparing, by the processing device, a size of a visual cue and the minimum text size for optical character recognition; and displaying, on a display operably connected to the mobile device, the image stream and the visual cue superimposed onto a portion of a white space of the document in the image stream, wherein the size of the visual cue is based on the minimum text size.
 2. The method of claim 1, further comprising notifying, by the processing device, a user of the mobile device when text displayed within the image stream is at least as large as the minimum text size.
 3. The method of claim 2, wherein the notifying comprises altering the visual cue, outputting a sound, causing a haptic feedback, or altering the image stream.
 4. The method of claim 1, further comprising determining, by the processing device, a size of the visual cue based upon image capturing device resolution, display resolution, and OCR engine operational requirements.
 5. The method of claim 1, further comprising capturing, by the image capturing device, a digital image of the image stream, wherein the digital image does not include the visual cue.
 6. The method of claim 1, wherein the visual cue comprises a geometric shape or an interactive cue.
 7. The method of claim 1, further comprising determining, by the processing device, a position of the visual cue within the image stream based upon an analysis of the image stream captured by the capture device.
 8. The method of claim 7, wherein the analysis of the image stream comprises minimizing interference between the visual cue and printed content being captured, wherein minimizing the interference comprises: determining a portion of the document that is free of text content; and positioning the visual cue in the portion that is free of text content.
 9. The method of claim 1, wherein the visual cue comprises a bounding box.
 10. The method of claim 1, wherein the visual cue comprises a text string.
 11. A mobile device comprising: a processing device; a display operably connected to the processing device and having an associated display resolution; an image capture device operably connected to the processing device and having an associated image capture resolution; and a non-transitory computer readable medium in communication with the processing device, the computer readable medium comprising one or more programming instructions for causing the processing device to: determine a minimum text size for optical character recognition (OCR) of an image captured by the mobile device, receive an image stream of a printed substrate of a document as captured by the image capture device, compare a size of a visual cue and the minimum text size for optical character recognition, and display the image stream and the visual cue superimposed onto a portion of a white space of the document in the image stream on the display, wherein the size of the visual cue is based on the minimum text size.
 12. The mobile device of claim 11, wherein the one or more instructions further comprise instructions for causing the processing device to output a notification to a user of the mobile device when text displayed within the image stream is at least as large as the minimum text size.
 13. The mobile device of claim 12, wherein the notification comprises altering the visual cue, outputting a sound, causing a haptic feedback, or altering the image stream.
 14. The mobile device of claim 11, wherein the one or more instructions further comprise instructions for causing the processing device to determine a size of the visual cue based upon the image capture resolution, the display resolution, and OCR engine operational requirements.
 15. The mobile device of claim 11, wherein the one or more instructions further comprise instructions for causing the processing device to capture a digital image of the image stream, wherein the digital image does not include the visual cue.
 16. The mobile device of claim 11, wherein the visual cue comprises a geometric shape, a bounding box, or an interactive cue.
 17. The mobile device of claim 11, wherein the one or more instructions further comprise instructions for causing the processing device to determine a position of the visual cue within the image stream based upon an analysis of the image stream captured by the capture device.
 18. The mobile device of claim 17, wherein the instructions that cause the processing device to analyze the image stream further comprise instructions for causing the processing device to minimize interference between the visual cue and printed content being captured.
 19. The mobile device of claim 18, wherein the instructions that cause the processing device to minimize the interference comprise instructions for causing the processing device to: determine a portion of the document that is free of text content; and position the visual cue in the portion that is free of text content.
 20. The mobile device of claim 11, wherein the visual cue comprises a text string.